Abstract - This paper reports a VLSI implementation of the CCSDS standard 
Reed Solomon encoder circuit for the Space Station. The 1.0 µm double metal 
CMOS chip is 5.9 mm by 3.6 mm, contains 48,000 transistors, operates at a sustained 
data rate of 320 Mbits/s and executes 2,560 Mops. The chip features a 
pin selectable interleave depth of 1 to 8. Block lengths up to 255 bytes 
as well as shortened codes are supported. Control circuitry uses register cells 
which are immune to Single Event Upset. In addition, the CMOS process used 
is reported to be tolerant of over 1 Mrad total dose radiation. 


1 General Description 

This chip implements an encoder for the CCSDS standard (255,223) Reed Solomon (RS) 
code [1]. An RS code is a cyclic symbol error correcting code for correcting errors introduced 
into data during transmission through a communication channel. The CCSDS 
standard is a 16 symbol error correction code. The code block consists of 223 information 
symbols and 32 parity symbols. Each symbol is an 8 bit word. Due to the flexible nature 
of the algorithms being implemented, the circuit will support the encoding of shortened, 
as well as full length RS codes. Specifically, the codes which are supported are of the form 
(255 - i, 223 - i), where i can be any integer from 0 to 222. 

The code is defined over the finite field GF(2^8). The field defining primitive polynomial 
is: 

$$ p(x) = x^8 + x^7 + x^2 + x + 1 \tag{1} $$

The generator polynomial is given by: 

$$ g(x) = \prod_{j=112}^{143} (x - \beta^j) \tag{2} $$

where $\beta = \alpha^{11}$ and $\alpha$ is a root of $p(x)$. 
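The field and generator polynomial definitions can be checked numerically. The following is a minimal sketch, not the chip's implementation: it works in the conventional polynomial basis, assumes $\alpha$ is the element x (0x02), and takes the product in equation (2) over j = 112 to 143 so that g(x) has the 32 factors needed for 32 parity symbols. The names `EXP`, `LOG`, `gf_mul`, `generator_poly` and `poly_eval` are illustrative.

```python
# GF(2^8) built from p(x) = x^8 + x^7 + x^2 + x + 1, i.e. bit mask 0x187.
PRIM_POLY = 0x187

# EXP[i] = alpha^i and LOG[alpha^i] = i, assuming alpha = x (0x02).
EXP = [0] * 510
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicate so LOG[a] + LOG[b] never overflows


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(first=112, last=143, step=11):
    """Return the coefficients of g(x), lowest degree first."""
    g = [1]
    for j in range(first, last + 1):
        root = EXP[(step * j) % 255]  # beta^j = alpha^(11 j)
        # Multiply g(x) by (x - root); subtraction is XOR in GF(2^8).
        nxt = [0] * (len(g) + 1)
        for k, coeff in enumerate(g):
            nxt[k] ^= gf_mul(coeff, root)
            nxt[k + 1] ^= coeff
        g = nxt
    return g


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


G = generator_poly()
```

Because the exponents 112 to 143 are symmetric about 127.5, the root set is closed under inversion, so the coefficient list `G` reads the same forwards and backwards. This symmetry means only half of the multiplier constants are distinct.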

The encoder represents data in the dual basis such that 

$$ [z_0, z_1, \ldots, z_7] = [u_7, u_6, \ldots, u_0]\, T \tag{3} $$

where $[z_0, z_1, \ldots, z_7]$ is the symbol represented in the dual basis, $[u_7, u_6, \ldots, u_0]$ is the 
symbol represented in the normal basis and $T$ is the following transform matrix: 
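
$$ T = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 1 & 1
\end{bmatrix} \tag{4} $$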


Normal data can be derived from data represented in the dual basis using the following 
inverse transform. 


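$$ [u_7, u_6, \ldots, u_0] = [z_0, z_1, \ldots, z_7]\, T^{-1} \tag{5} $$

where 

$$ T^{-1} = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 & 0
\end{bmatrix} \tag{6} $$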
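The two basis conversions are plain row-vector by matrix products over GF(2). The sketch below stores each row of the inverse matrix from equation (6) as an 8 bit integer and recovers the forward matrix by Gauss-Jordan elimination, so the forward matrix is derived from (6) rather than typed in by hand. The bit packing (first vector element in the most significant bit) and the function names are assumptions made for illustration.

```python
# Rows of T^-1 as printed in equation (6), leftmost column in the MSB.
T_INV_ROWS = [
    0b11000101,
    0b01000010,
    0b00101110,
    0b11111101,
    0b11110000,
    0b01111001,
    0b10101100,
    0b11001100,
]


def vec_mat(vec, rows):
    """Row vector times matrix over GF(2); the MSB of vec selects rows[0]."""
    out = 0
    for i, row in enumerate(rows):
        if (vec >> (7 - i)) & 1:
            out ^= row
    return out


def invert_gf2(rows):
    """Invert an 8x8 binary matrix by Gauss-Jordan elimination over GF(2)."""
    n = len(rows)
    # Augment each row with the matching identity row: [row | identity].
    aug = [(row << n) | (1 << (n - 1 - i)) for i, row in enumerate(rows)]
    for col in range(n):
        mask = 1 << (2 * n - 1 - col)
        # Raises StopIteration if the matrix is singular.
        pivot = next(r for r in range(col, n) if aug[r] & mask)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r] & mask:
                aug[r] ^= aug[col]
    return [a & ((1 << n) - 1) for a in aug]


T_ROWS = invert_gf2(T_INV_ROWS)


def dual_to_normal(z):
    """z packs [z0..z7] with z0 in the MSB; the result packs [u7..u0]."""
    return vec_mat(z, T_INV_ROWS)


def normal_to_dual(u):
    """u packs [u7..u0] with u7 in the MSB; the result packs [z0..z7]."""
    return vec_mat(u, T_ROWS)
```

A round trip such as `dual_to_normal(normal_to_dual(u))` should return `u` for every byte value, which is a quick way to confirm that the printed matrix was transcribed correctly.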


A dual field is simply a different representation of the original field. The coefficients of 
g(x) are linear operators. An operator O in the original representation of the field can be 
used in the dual representation by applying the following transform. 


$$ O_{dual} = T \, O_{original} \, T^{-1} \tag{7} $$


Additional details of the mathematics can be found in [2]. 

The coder circuit has data input and output ports. Data is input in a byte serial 
fashion at a constant rate, and is output in a byte serial fashion with a fixed one clock 
cycle latency. After the information bytes have been output, the 32 bytes of RS parity are 
appended to the data stream. The data rate for the chip is 40 Mbytes/sec when clocked 
at a rate of 40 MHz. 

The encoder can be programmed to interleave the data at depths of one, two, ... or 
eight. Interleaving of two or more encoded messages allows higher burst error correction 
capabilities. The interleaving depth, J, is controlled by external pins S0, S1 and S2. 

2 Chip Operation 

2.1 Initialization 

Before proper circuit operation can begin, the encoder sections must be initialized and the 
interleave depth chosen. This is accomplished by bringing the reset input (RST) high for 
at least two clock pulses and setting the interleave depth control lines, S0, S1 and S2, to 
the appropriate state. 

At this time, it is also necessary to bring the input control (INC) inactive low to ensure 
no spurious messages are processed. Two clock pulses after RST is brought low, circuit 
operation may commence. The circuit may be re-initialized at any time but any messages 
being processed by the encoder section at that time will be lost. Zeros are clocked into the 
parity generator whenever INC is low. 

2.2 Encoder Operation 

Assuming the initialization sequence has been performed, encoding is performed in the 
following manner. INC is brought high coincident with the first message symbol to be 
encoded. It remains high while successive message symbols are clocked into the encoder 
on the data input bus (DI). Symbols are clocked in and out of the circuit on the rising 
edge of the symbol clock (CK). 

INC is brought low again when the last message symbol has been clocked into the 
circuit. It must remain low for at least 32J clock cycles, where J is the interleave depth, 
during which time the parity symbols will be clocked out of the circuit. This operation also 
fills the parity generator with zeros. If INC is held low longer than 32J clock cycles, zeros 
will appear on the data output bus (DO). Bringing INC high after it has been low for 32J or 
more clock cycles starts the processing of the next message. 
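The framing protocol described in this section can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the chip's logic: it works in the conventional basis (no dual basis conversion), ignores the one clock cycle output latency, takes the generator roots as $\alpha^{11j}$ for j = 112 to 143, and holds INC low for 32 cycles per interleaved codeword. The class name `InterleavedEncoder` and the helper functions are hypothetical.

```python
from collections import deque

PRIM_POLY = 0x187  # x^8 + x^7 + x^2 + x + 1
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i], LOG[_v] = _v, _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY
EXP[255:] = EXP[:255]


def gf_mul(a, b):
    """Multiply in GF(2^8) using log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


# Assumed generator roots alpha^(11 j), j = 112..143, and g(x) lowest degree first.
ROOTS = [EXP[(11 * j) % 255] for j in range(112, 144)]
G = [1]
for _root in ROOTS:
    G = [(G[k - 1] if k else 0) ^ (gf_mul(G[k], _root) if k < len(G) else 0)
         for k in range(len(G) + 1)]


class InterleavedEncoder:
    """32 parity stages, each a shift register `depth` symbols long."""

    def __init__(self, depth):
        if not 1 <= depth <= 8:
            raise ValueError("interleave depth must be 1..8")
        self.stages = [deque([0] * depth) for _ in range(32)]

    def clock(self, inc, di=0):
        """One symbol clock: returns the symbol driven on the output bus."""
        outs = [stage[0] for stage in self.stages]
        # With INC low, zeros enter the generator and parity shifts out.
        feedback = (di ^ outs[31]) if inc else 0
        for k, stage in enumerate(self.stages):
            carry = outs[k - 1] if k else 0
            stage.popleft()
            stage.append(carry ^ gf_mul(G[k], feedback))
        return di if inc else outs[31]


def encode_stream(message, depth):
    """Clock in an interleaved message, then 32 * depth parity cycles."""
    enc = InterleavedEncoder(depth)
    out = [enc.clock(True, sym) for sym in message]
    out += [enc.clock(False) for _ in range(32 * depth)]
    return out


def is_codeword(symbols):
    """Check that a block (first symbol = highest degree) vanishes at every root."""
    for root in ROOTS:
        acc = 0
        for s in symbols:
            acc = gf_mul(acc, root) ^ s
        if acc:
            return False
    return True
```

For example, `encode_stream(msg, 3)` with 60 message symbols produces three interleaved blocks of a shortened (52,20) code; slicing the output as `out[r::3]` for r = 0, 1, 2 recovers each block, and each one should pass `is_codeword`.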

2.3 Bypass Operation 

After a reset operation or after a block has been encoded and the parity read from the 
chip, a data bypass operation can occur. Data can flow through the encoder without being 
encoded by bringing the bypass input control (BIC) high coincident with the first byte 
of data to be passed unprocessed by the chip. After the one clock cycle latency, the data 
entering on DI appears on DO and continues to pass through the chip as long as BIC 
remains high. While BIC is high, INC should be held low to keep the registers in the 
parity generator held reset. 

2.4 Space Enhancement Features 

The CMOS fabrication process used is reported to be tolerant of total dose radiation 
levels exceeding 1 Mrad [3]. In addition, the chip is designed to provide protection against 
Single Event Upset (SEU) in two ways. First, control memory cells are designed to be 
electronically tolerant of SEUs. Second, the control structure and data path are configured 
to completely reset after each message, ensuring that an SEU of the data registers will affect 
at most one encoded message. 

A 16 bit shift register has been included on the chip with the input driven by the test 
input pin (TI) and the output driving the test output pin (TO). This test structure will 
enable the SEU immune memory cell to easily be tested under conditions of irradiation to 
verify the immunity. 

3 VLSI Implementation 

Full custom VLSI was used to achieve both circuit density and speed. The basic VLSI 
architecture implemented here is similar to a previous full custom design [4]. The additional 
features include interleaving, high speed operation (320 Mbits/sec), radiation hardened 
processing and SEU protection. 

3.1 General Organization 

Figure 1 shows a top level logic diagram. The chip consists of an encoder section and a 
test shift register. The encoder contains 32 multipliers and 32 adders which operate in 
parallel so that the mathematics required for the parity generation can be performed at 
the data input clock rate. The encoder also contains the 2048 registers (32x8x8) required 
to interleave the data to a maximum depth of 8. The 16 bit shift register is a test structure 
that will be used to verify the SEU immunity of the registers used for the control circuitry. 
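
The headline figures quoted in this paper can be cross-checked with simple arithmetic. The operation count below assumes that each multiplier and each adder contributes one operation per clock cycle; this is an interpretation of the 2,560 Mops figure, not something the text states explicitly.

```latex
\begin{align*}
\text{data rate}  &= 40\ \text{MHz} \times 8\ \text{bits/symbol} = 320\ \text{Mbits/s} \\
\text{registers}  &= 32\ \text{slices} \times 8\ \text{bits} \times 8\ \text{(maximum interleave depth)} = 2048 \\
\text{operations} &= (32 + 32) \times 40 \times 10^{6}\ \text{s}^{-1} = 2560\ \text{Mops}
\end{align*}
```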

Data is input on the DI0-7 pins and output on the DO0-7 pins. Input data is framed 
by the INC control signal. Output data is framed by OUTC, which is a delayed copy of INC. 
When data is input to the chip, it is presented to the parity generator and also passed 
out the output port DO0-7. At the end of the data block, INC transitions low and the 
output of the parity generator passes out DO0-7. With INC low, zeros are input to the 
parity generator, clearing out the registers. With BIC high and INC low, data flows from 
DI0-7 to DO0-7 without being input to the parity generator, providing a bypass mode. 


3.2 Parity Generator 
Figure 2: Parity generator block diagram. 



Figure 3: Layout of constant multiplier. 

The parity generator was organized as a set of 32 slices. Each slice consists of a multiply/add 
structure and a register stack for interleaving. Figure 2 shows a block diagram of 
the logic for the parity generator. Each register is a 1 to 8 bit shift register depending on 
the interleave depth set up during circuit initialization. The multiplier cell is a precharged 
exclusive OR (XOR) chain. Eight of these chains form a constant multiplier. The input data 
word is multiplied by a constant, g_i, programmed into the multiplier as XOR cells or 
interconnect (ZERO) cells. The XOR cell consists of 4 NMOS transistors. The ZERO 
cell is a modified XOR cell which acts as an interconnect block. The layout of a constant 
multiplier is shown in Figure 3. The multiplication constant can be programmed with a 
single mask layer defining the pattern of XOR and ZERO cells in the XOR chains. For 
maximum speed the XOR chain is precharged from both ends. The addition function is 
folded into the evaluate structure for the multiplier. The XOR cell was designed in layout 
to consume minimum area and the registers were designed to match the pitch of two XOR 
cells. Half the registers were placed above the multiplier and half below. 
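
Multiplication by a fixed field element is linear over GF(2), which is why it reduces to eight XOR chains whose cell pattern encodes the constant. The sketch below illustrates that idea in the conventional basis; the chip operates in the dual basis, so its actual cell patterns will differ. A set mask bit stands for an XOR cell and a clear bit for a ZERO (interconnect) cell. The function names and the `addend` parameter, which models the folded-in adder input, are illustrative assumptions.

```python
PRIM_POLY = 0x187  # x^8 + x^7 + x^2 + x + 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8), reduced by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def xor_chain_masks(constant):
    """For each output bit, return a mask of the input bits that get an XOR cell.

    A set bit models an XOR cell; a clear bit models a ZERO cell.
    """
    masks = [0] * 8
    for in_bit in range(8):
        # Column of the multiplier matrix: the constant times one basis bit.
        product = gf_mul(constant, 1 << in_bit)
        for out_bit in range(8):
            if (product >> out_bit) & 1:
                masks[out_bit] |= 1 << in_bit
    return masks


def multiply_with_chains(masks, data, addend=0):
    """Evaluate the eight chains; `addend` is XORed into each chain result."""
    out = 0
    for out_bit, mask in enumerate(masks):
        parity = bin(data & mask).count("1") & 1
        parity ^= (addend >> out_bit) & 1
        out |= parity << out_bit
    return out
```

Calling `multiply_with_chains(xor_chain_masks(c), d, a)` should agree with `gf_mul(c, d) ^ a` for any bytes `c`, `d` and `a`, which mirrors how changing only the XOR/ZERO pattern reprograms the constant.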

Figure 4: Layout of two adjacent slice details. 

Since the registers are twice as wide as the XOR chain, the outputs of the columns in 
the multiplier alternate between top and bottom. The higher order nibble is output on 
one end of the constant multiplier and the lower order nibble on the opposite end. This 
requires the columns of the multiplier matrix to be rearranged such that the columns in the 
matrix are $[C_0\ C_4\ C_1\ C_5\ C_2\ C_6\ C_3\ C_7]$. Also, in order to avoid long interconnect runs between 
registers on the top and bottom to drive the adder inputs in the multiply/add structure, 
a second slice detail was drawn such that the top and bottom sections were reversed. 
These two slice details were then alternated in the parity core allowing connection of the 
adjacent slices by abutment. Figure 4 shows the layout of two adjacent slices. There is no 
interconnect required between any of the leaf cells. The entire structure is connected by 
abutment. This maximizes the speed of operation since the interconnect capacitance has 
been minimized. 

A natural layout would place all 32 slices in a row. This would maximize the speed of 
operation and minimize the area required for the parity generator, but would result in a die 
size of approximately 10 mm by 2 mm. This aspect ratio would be hard to accommodate 
in packaging and the reliability of the bond wires would be in question, especially under 
the stresses expected during launch. The speed was therefore compromised by folding the 
array in half. The control was duplicated and an interconnect bank was run from the 
output of Slice 15 to the input of Slice 16. The final chip layout is shown in Figure 5. 

Figure 5: Chip layout. 


4 Summary 


A Rad Hard, SEU tolerant implementation of the CCSDS standard RS 16 encoder has 
been designed for Goddard Space Flight Center. The chip was drawn in a 1.0 µm CMOS process 
and is being fabricated at Hewlett Packard's Circuit Technology Group. The encoder 
operates at a 320 Mbit/sec data rate. 